Lookup Join Types
FastStats Designer supports 4 different join types for lookup tables:
Multi Parent Record (join at build time)
Join at Build Time is the traditional way FastStats has handled multi parent lookups. An index is created at build time and as the parent table is read the relevant row is found in the lookup table and output as a record on the parent table. This works well if the lookup table is narrow (a few Selector columns) but can be inefficient if the lookup is wide.
Multi Parent Record (join at query time)
Join at Query Time is an alternative way of processing multi-parent lookup tables. At build time a 32 bit integer key is added to the parent table and this provides the row number of the lookup table. The data is retrieved from the lookup at query time. This can reduce the disk space required if the lookup table is wide. A typical use of this join strategy would be to link a wide 'Product Details' lookup to an 'Order Line' table.
Lookup tables loaded in this manner cannot have any more than 8.3m rows (8 * 1024 * 1024). If they do then FastStats Designer use the 'Join at Build Time' strategy to load this lookup.
Single Parent Record Lookups
Single Parent Record indicates that only each lookup row is only retrieved once. This knowledge enables FastStats to use a sorting strategy rather than index strategy which can increase load performance.
Single Parent Record Lookup (SPL) Examples
In the Booking-Destination example the same Destination details are retrieved many times for each matching booking. Therefore records in the Destination table have multiple parent records (Single Parent Record should be unchecked).
Another lookup might contain extra information to append to the customers table for “gold club” accounts. In this case each lookup record would only be retrieved once. In this case Single Parent Record should be checked.
A good clue that a lookup should be marked as a SPL is if the key on the lookup table joins to a reference field on the parent table. Assuming there are no duplicate references (if you have duplicate references you will be warned of this at the end of the build).
How do I test to see if my source data is suitable for a single parent record lookup?
If your input data is in a database you can use a SQL query similar to this example to check that the lookup records are only matched once.
SQL
SELECT MAX(matched_rows) FROM
( SELECT COUNT(p.URN) matched_rows FROM People p,PeopleDetails pd WHERE p.URN = pd.URN GROUP BY p.URN ) matched_rows
This query will return the maximum number of matching records between People and PeopleDetails - if this result is 1 then the PeopleDetails table is suitable to be marked as a SPL. If it is greater than 1 then it must be a normal lookup.
In some cases FastStats Designer will be able to detect if a MPL lookup could be a SPL lookup. In this case you may see the following entry in the build log:
"Only single matches detected on lookup so this table could be implemented as single parent lookup"
Single Parent Record Lookup Types
Single Parent Record Lookups can be loaded in one of two ways:
1, Single Parent Record
This uses a sort strategy to perform the join and builds the data into a transaction table. At query time the appropriate record is loaded from the transaction table. This means that if the lookup does not match to all the records in the parent table then it will take less space.
2, Single Parent Record (merged onto parent table)
This uses a sort strategy to perform the join and builds the data onto the parent table (like an 'MPL at build time' does). This can result in improved query performance for queries that involve the lookup fields but can waste a large amount of disk space if the lookup is sparsely populated.
Summary
Lookup Type |
Advantages |
Disadvantages |
MPL (join at build time) |
Saves space if table is narrow (<40 bits wide). Less memory required at query time. |
Wide lookups can waste space and take longer to build. |
MPL (join at query time) |
Saves space and build time if lookup records are wide. |
May require 64 bit Designer and 64 bit Service to allocate enough memory to hold them. Limited to 10M rows. |
SPL |
Saves space if table is sparse. |
Possible reduction in query performance. |
SPL (merged onto parent table) |
Maintains query performance and saves space if lookup is fully populated. |
Wastes space and can be slower if table is sparsely populated. |